Retrieval-augmented generation (RAG) retrieves relevant information from an external knowledge source, enabling large language models (LLMs) to answer questions over private and/or previously unseen document collections. RAG performs well on explicit retrieval tasks but struggles with query-focused summarization (QFS) tasks, such as global questions directed at an entire text corpus.
To address this, Microsoft Research developed a new approach, GraphRAG, which uses an LLM to build a knowledge graph over the private dataset. GraphRAG constructs a graph-based text index in two stages: (1) deriving a knowledge graph from the source documents, and (2) generating summaries for groups of closely related entities. When a question is posed, these summaries are used to create partial responses, which are then combined into a final answer for the user.
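The two-stage flow can be sketched in a few lines of Python. Note that `extract_graph` and the one-line group summaries below are hypothetical stubs standing in for the LLM prompt calls GraphRAG actually makes; the stubs exist only so the structure of the pipeline is runnable.

```python
# Minimal sketch of the two-stage GraphRAG-style indexing flow.
# extract_graph() and the summary strings stand in for LLM calls.

def extract_graph(documents):
    """Stage 1 (stubbed): derive (entity, relation, entity) triples.
    Stub heuristic: link consecutive capitalized words as entities."""
    triples = []
    for doc in documents:
        words = doc.split()
        for a, b in zip(words, words[1:]):
            if a[0].isupper() and b[0].isupper():
                triples.append((a, "related_to", b))
    return triples

def build_index(documents):
    triples = extract_graph(documents)            # stage 1: knowledge graph
    groups = {}                                   # stage 2: group related entities
    for head, rel, tail in triples:
        groups.setdefault(head, []).append((rel, tail))
    # Stub summary per group; a real pipeline would prompt the LLM for prose.
    return {e: f"{e}: {len(edges)} relationship(s)" for e, edges in groups.items()}

index = build_index(["Alice Smith works with Bob Jones"])
```

The point of the sketch is the separation of concerns: stage 1 produces graph elements, stage 2 produces text summaries over groups of those elements, and only the summaries are consumed at query time.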
A typical GraphRAG pipeline uses an LLM-derived graph index of source document text. This index spans nodes (e.g., entities), edges (e.g., relationships), and covariates (e.g., claims) that have been detected, extracted, and summarized by LLM prompts tailored to the domain of the dataset. Community detection is used to partition the graph index into groups of elements (nodes, edges, covariates) that the LLM can summarize in parallel at both indexing time and query time. The "global answer" to a given query is produced using a final round of query-focused summarization over all community summaries reporting relevance to that query.
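To make the partitioning step concrete: GraphRAG uses a hierarchical community-detection algorithm (Leiden), but the idea of splitting a graph into summarizable groups can be illustrated with something much simpler. The sketch below uses connected components via union-find as a stand-in, which is not the real algorithm, only an assumption-labeled simplification.

```python
# Partitioning a graph index into node groups for parallel summarization.
# GraphRAG uses Leiden; connected components serve as a simple stand-in here.

def find(parent, x):
    """Find a node's set representative, with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def communities(edges):
    """Group nodes into clusters based on graph connectivity."""
    parent = {}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(parent, a)] = find(parent, b)   # union the two sets
    groups = {}
    for node in parent:
        groups.setdefault(find(parent, node), set()).add(node)
    return list(groups.values())

# Two clusters emerge: {A, B, C} and {X, Y}.
parts = communities([("A", "B"), ("B", "C"), ("X", "Y")])
```

Each resulting group would then be handed to the LLM independently, which is what makes the summarization step parallelizable.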
Unlike previous methods that utilize the structured retrieval and traversal capabilities of graph indexes, GraphRAG prioritizes an untapped aspect of graphs: their built-in modularity. It emphasizes the capacity of community detection algorithms to divide graphs into cohesive clusters of closely connected nodes. LLM-generated summaries of these communities provide complete coverage of the underlying graph index and the input documents it represents. Query-focused summarization of an entire corpus is then made possible using a map-reduce approach: first using each community summary to answer the query independently and in parallel, then summarizing all relevant partial answers into a final global answer.
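The map-reduce step described above can be sketched as follows. Here `answer_from_summary` (the map step) and the final join (the reduce step) are hypothetical stubs: real GraphRAG prompts an LLM for partial answers and for the final synthesis, whereas this sketch scores relevance by toy keyword overlap.

```python
# Map-reduce sketch of query-focused summarization over community summaries.
# The scoring and the final join are stubs for what would be LLM calls.

from concurrent.futures import ThreadPoolExecutor

def answer_from_summary(summary, query):
    """Map step (stubbed): return (relevance_score, partial_answer) or None."""
    overlap = set(summary.lower().split()) & set(query.lower().split())
    return (len(overlap), summary) if overlap else None

def global_answer(community_summaries, query):
    # Map: answer the query against each community summary in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: answer_from_summary(s, query),
                                 community_summaries))
    # Reduce: keep relevant partials, most relevant first, and combine them.
    relevant = sorted((p for p in partials if p), reverse=True)
    return " ".join(text for _, text in relevant)

ans = global_answer(["sales grew in Europe", "the office dog is cute"],
                    "how did sales change")
# Only the first summary shares a keyword with the query, so only it survives.
```

The design choice worth noting is that irrelevant communities are dropped during the map phase, so the reduce phase only ever synthesizes over summaries that reported relevance to the query.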